Improve nearest-neighbor query perf by Zaczero · Pull Request #708 · pytroll/pyresample

Zaczero · 2026-03-02T00:16:40Z

query_no_distance:
runtime: 0.01383s -> 0.01279s (-7.5%).
rss: 221096KB -> 204204KB (-7.6%).

_verify_input_object_type numpy array:
runtime: 0.02705s -> 0.02002s (-26%).
rss: same

_compute_radius_of_influence nan case:
runtime: 19.71s -> 2.10s.
rss: same

Tests added
Tests passed

codecov · 2026-03-02T12:11:52Z

Codecov Report

❌ Patch coverage is 89.13043% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.66%. Comparing base (4da28b0) to head (b23310e).
⚠️ Report is 40 commits behind head on main.

Files with missing lines	Patch %	Lines
pyresample/future/resamplers/nearest.py	68.75%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #708      +/-   ##
==========================================
- Coverage   93.67%   93.66%   -0.02%     
==========================================
  Files          89       89              
  Lines       13621    13707      +86     
==========================================
+ Hits        12759    12838      +79     
- Misses        862      869       +7

Flag	Coverage Δ
unittests	`93.66% <89.13%> (-0.02%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

djhoese

Nice job. Thanks for finding this. I've always disliked the indexing done in the query so I'm glad to see it cleaned up. I had a couple questions, but otherwise the majority of this looks good to me.

Otherwise could you check pre-commit's failures? Thanks.

pyresample/future/resamplers/nearest.py

djhoese · 2026-03-02T16:46:11Z

pyresample/future/resamplers/nearest.py

+        if np.isnan(src_res):
+            radius_of_influence = dst_res
+        elif np.isnan(dst_res):
+            radius_of_influence = src_res
+        else:
+            radius_of_influence = max(src_res, dst_res)


Could you explain why this is better? Because it prefers the source resolution?

It works just the same and avoids the all-nans warning print, which is quite expensive in Python. So it benefits only the worst-case scenario.

pyresample/future/resamplers/nearest.py

djhoese · 2026-03-04T11:56:41Z

In your original description you talked about the performance of _verify_input_object_type. How does this PR affect that? Where is this function?

djhoese · 2026-03-04T11:57:49Z

Oh and what was your test case for your timing? I'm surprised by how fast the query stuff was, but even more by how slow the NaN case was for the radius of influence stuff.

djhoese · 2026-03-04T11:59:29Z

One last thing: I'm not ignoring your other pyresample and satpy PRs, but I'm still catching up on real work after being out for a couple days and I'm trying to fully understand the changes before commenting. Thanks again for all these changes/fixes/improvements.

Zaczero · 2026-03-15T11:13:57Z

In your original description you talked about the performance of _verify_input_object_type. How does this PR affect that? Where is this function?

pyresample/future/resamplers/nearest.py:496 replaces data.copy() with data.copy(deep=False) before swapping .data to a dask array. That avoids a deep copy of the underlying NumPy buffer.

Oh and what was your test case for your timing? I'm surprised by how fast the query stuff was, but even more by how slow the NaN case was for the radius of influence stuff.

I don't remember now, but I was measuring the specific code paths in isolation, not in a bigger system. I like efficient software.

One last thing: I'm not ignoring your other pyresample and satpy PRs, but I'm still catching up on real work after being out for a couple days and I'm trying to fully understand the changes before commenting. Thanks again for all these changes/fixes/improvements.

Np, I was delaying too because I started working on the data I got processed 😄.

djhoese · 2026-03-17T20:59:46Z

I don't remember now, but I was measuring the specific code paths in isolation, not in a bigger system. I like efficient software.

Were you running things multiple times? I'm unable to get my laptop with a real world case to perform in a consistently better way. I'll try a few more things to sanity check...

djhoese · 2026-03-17T21:05:40Z

I don't remember what data you said you're processing, but I'm taking ABI L1b data (G16) and resampling it using nearest neighbor to the Satpy builtin "brazil" area. With current pyresample main I first got 11.5s so I switched to this branch and got 10.87s so I thought "awesome, faster". Then I ran it over and over again and got numbers from 11.63s to 11.86s. Then I went back to main and got 11.11s to 11.83s.

from datetime import datetime
from satpy import Scene
from glob import glob

scn = Scene(reader="abi_l1b", filenames=glob("/data/satellite/abi/20180623/OR_ABI-L1b-RadF-M3C*G16*.nc"))
scn.load(["C02"])

first = datetime.now()
new_scn = scn.resample("brazil", resampler="nearest")
new_scn["C02"].compute()
#new_scn.save_datasets(filename="C02_brazil.tif")
second = datetime.now()

print((second - first).total_seconds())

Zaczero · 2026-03-25T12:56:12Z

In my PR body, I performed 3 distinct measurements on the individual methods directly, not the full pipeline. I do confirm that the improvements are still reproducible. I've also noticed I implemented one more valuable specialization to query_no_distance that brings 6% performance improvement in the neighbors=1 common case.

djhoese · 2026-03-26T11:51:01Z

Thanks. I've been off of work the last week or so but am catching up on things today. I'll try to review this when I get a chance. And yeah if you're testing the individual functions it makes sense to see these low-level improvements but then I not see them on the high-level when the majority of time is spent building the KDTree and the actual query time.

mraspaud

LGTM, thanks for taking the time to simplify the processing!

djhoese reviewed Mar 2, 2026

View reviewed changes

Zaczero force-pushed the nearest-qnd-perf branch from a17a10b to 2d09c3b Compare March 3, 2026 22:18

Improve nearest-neighbor query perf

b23310e

Zaczero force-pushed the nearest-qnd-perf branch from 2d09c3b to b23310e Compare March 25, 2026 12:52

djhoese added the enhancement label Mar 26, 2026

djhoese assigned mraspaud and djhoese Mar 26, 2026

djhoese requested review from djhoese and mraspaud March 26, 2026 11:49

mraspaud approved these changes Mar 26, 2026

View reviewed changes

mraspaud merged commit b33c1db into pytroll:main Mar 26, 2026
25 of 26 checks passed

Zaczero deleted the nearest-qnd-perf branch March 26, 2026 14:10

Conversation

Zaczero commented Mar 2, 2026

Uh oh!

codecov bot commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

djhoese left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

djhoese Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

Zaczero Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

djhoese commented Mar 4, 2026

Uh oh!

djhoese commented Mar 4, 2026

Uh oh!

djhoese commented Mar 4, 2026

Uh oh!

Zaczero commented Mar 15, 2026

Uh oh!

djhoese commented Mar 17, 2026

Uh oh!

djhoese commented Mar 17, 2026

Uh oh!

Zaczero commented Mar 25, 2026

Uh oh!

djhoese commented Mar 26, 2026

Uh oh!

mraspaud left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov bot commented Mar 2, 2026 •

edited

Loading